Towards the Detection of Cross-Language Source Code Reuse
Abstract
The Internet has made huge amounts of information available, including source code. Source code repositories and, more generally, programming-related websites facilitate its reuse. In this work, we propose a simple approach to the detection of cross-language source code reuse, a problem that has barely been investigated. Our preliminary experiments, based on character n-gram comparison, show that considering different sections of the code (e.g., comments, code, reserved words) leads to different results. Across three programming languages (C++, Java, and Python), the best result is obtained when comments are discarded and the entire source code is considered.
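The comparison described above can be sketched roughly as follows: discard comments, extract character n-grams, and score a pair of files by how many n-grams they share. This is a minimal illustration under stated assumptions, not the authors' implementation. The choice of n = 3, cosine similarity over n-gram frequency vectors, whitespace collapsing, lowercasing, and the regex-based comment stripper are all assumptions, and the function names (`strip_comments`, `char_ngrams`, `cosine`, `similarity`) are hypothetical.

```python
import math
import re
from collections import Counter


def strip_comments(source: str, language: str) -> str:
    """Remove comments with simple regexes.

    Assumption: a naive stripper for illustration only; it does not understand
    string literals, so a '#' or '//' inside a string is also removed.
    """
    if language == "python":
        # Python line comments start with '#'.
        return re.sub(r"#[^\n]*", "", source)
    # C++ and Java (any other language value): /* ... */ blocks, then // lines.
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", source)


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (n=3 is an assumed default)
    after collapsing whitespace and lowercasing."""
    text = " ".join(text.split()).lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram frequency vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(src_a: str, lang_a: str, src_b: str, lang_b: str, n: int = 3) -> float:
    """Score a candidate cross-language reuse pair with comments discarded."""
    grams_a = char_ngrams(strip_comments(src_a, lang_a), n)
    grams_b = char_ngrams(strip_comments(src_b, lang_b), n)
    return cosine(grams_a, grams_b)


# Hypothetical example pair: a Java loop and a Python translation of it.
java_src = "int total = 0; // running sum\nfor (int i = 0; i < n; i++) total += values[i];"
python_src = "total = 0  # running sum\nfor i in range(n):\n    total += values[i]"
print(f"{similarity(java_src, 'java', python_src, 'python'):.3f}")
```

Comments are stripped before n-gram extraction because the abstract reports the best result when comments are discarded and the entire remaining source is kept. Scoring other sections (for example, reserved words only) would need a different filter in place of `strip_comments`, and a robust system would use a real tokenizer that respects string literals.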
Similar resources
Dwarf Frankenstein is still in your memory: tiny code reuse attacks
Code reuse attacks such as return-oriented programming and jump-oriented programming are the most popular exploitation methods among attackers. A large number of practical and impractical defenses have been proposed, differing in their overhead, source code requirements, detection rate, and implementation dependencies. However, a usual aspect among these methods is consideration of the common be...
(CLSCR) Cross Language Source Code Reuse Detection Using Intermediate Language
In today's digital era, information access is just a click away, so computer science students also have easy access to source code from many different websites; thus it has become difficult for academicians to detect source code reuse in students' programming assignments. The new trend in the area of source code reuse is using the source code by translating it into another programming language p...
Normalization based Stop-Word approach to Source Code Plagiarism Detection
This paper is a report of PES Institute of Technology’s participation in the Cross Language Detection of Source Code Reuse (CL-SOCO) task at FIRE 2015 [1]. We approach this task as a text document plagiarism task, without considering formal programming language grammatical structure. We use normalization of commonly used identifiers to detect pairs of programs which have the same objective. We als...
Mainland Chinese Students’ Shifting Perceptions of Chinese-English Code-Mixing in Macao
As a former Portuguese colony, Macao is the only region in China where Cantonese, a variety of Chinese, and English, an international language, are enjoying de facto official statuses, with Putonghua being a quasi-official language and Portuguese being another official language. Recently, with an increasing number of Mainland Chinese students crossing the border to pursue their tertiar...
Source Code Reuse Analysis in Multiple Projects based on the Clone Genealogy
In the software industry and OSS projects, source code reuse is said to improve the productivity and reliability of software development and to reduce development time. On the other hand, source code reuse requires professional skills from developers, and ad-hoc reuse might introduce maintenance problems. The source code reuse analysis for software development organizations is worthy to be ...